MGL4MEP: Multimodal Graph Learning for Modeling Emerging Pandemics with Big Data

Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework, MGL4MEP, that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, lower cost, and more potential information indicators.


Introduction
Pandemics are global outbreaks of infectious diseases that affect many people across continents. The COVID-19 pandemic is one of the most significant pandemics of our time, impacting millions of individuals worldwide and causing lasting effects on our society. To combat pandemics, it is crucial to develop efficient solutions that facilitate the comprehension of their transmission and containment. This requires tracking and evaluating the evolution of pandemics through efficient monitoring and analysis of online resources that provide rich information, reflecting public knowledge and perceptions in a timely manner. For instance, the volume of social media interest can serve as an early indicator of COVID-19 waves 1,2, and users' content can unveil diverse perspectives on regulations, such as quarantine measures or vaccination strategies. Understanding these signals can help policymakers combat pandemics by recognizing trends in their spread and impact on the population, as well as the efficacy of current countermeasures.
Traditional pandemic monitoring involves tracking hospital admissions, laboratory testing, and death rates, but it can be expensive and can lag in providing real-time updates on disease spread. Compartmental models like SIR 3 and statistical analyses such as ARIMA 4 and Prophet 5, which use past data for predictions, are common approaches. However, these statistical models rely on assumptions and might lack the data needed to precisely estimate factors like the reproduction number in pandemic planning and forecasting. Deep learning offers an alternative that learns, from historical statistics, patterns and trends that can help predict future outbreaks. By using deep learning algorithms, we can analyze large amounts of data quickly and accurately, making it easier to identify patterns and trends that might be missed by other methods. Recent works leveraged deep learning-based methods to learn from statistics at earlier time stamps and forecast COVID-19 incidence, achieving better performance compared to traditional methods 7.

[Figure 1: example social media posts in two panels, "Support Government Response" and "Oppose Government Response".]
One of the limitations of previous works in pandemic forecasting is that they frequently rely entirely on epidemiological data and ignore other information that might act as sensors or indicators of the pandemic's patterns and evolution. Data from search engines, for example, can be used to monitor how individuals are looking for information about pandemics 8,9. More crucially, social media data can be utilized to monitor how people are reacting to and feeling about a pandemic 10,11. Although previous research has examined the connection between social media usage and pandemic trends 12, there has been little use of deep learning techniques to predict and track the spread of the epidemic. By including external knowledge from social media in pandemic forecasting models, we can gain a more complete view of how an epidemic is evolving and how effective various interventions might be. For instance, by monitoring social media data, we may target public health campaigns at regions where people are most worried about a pandemic. Fig. 1 illustrates our motivating examples, which show different stances of social media users on the COVID-19 pandemic and on government regulations.
During pandemics, social media has emerged as a key source of information, offering real-time updates on the spread of diseases and people's responses to them. Therefore, we investigate how pandemic tracking and analysis using deep learning algorithms can benefit from external multi-modal knowledge, including social media and government regulations. More specifically, we construct graph-structured data from social media, treating each user as a node representing the current epidemic status. We dynamically capture interactions between users using temporal graph learning. Graph learning, particularly Graph Neural Networks (GNNs) 13,14, is an important branch of machine learning that deals with learning from and representing a variety of real-world relational data, including citation networks, social networks, and knowledge graphs. Incorporating graph learning techniques and graph-structured representations offers a promising approach to overcome the limitations of previous works in pandemic forecasting, since graphs can capture the structural and semantic information of the pandemic domain 15-17.
In this work, we introduce MGL4MEP, a neural framework for forecasting and analyzing developing pandemics using big data sources and deep learning methods, including graph neural networks. We use the recent COVID-19 pandemic and its effects on multiple areas as a case study. In order to trace and predict the evolution of the pandemic, we investigate the relationship between the pandemic risk factors and other relevant data sources such as social media. Our framework can support end users such as politicians, policy makers, and the general population by providing complementary analysis and forecast information, leading to more effective crisis prevention and faster reaction times. Our contributions in this work are summarized as follows:
• We propose a multi-modal neural framework named MGL4MEP for COVID-19 pandemic tracking and prediction.
• We extract and combine data from multiple sources, including social signals and government stringency signals, as additional indicators to monitor pandemic trends and predict future evolution.
• We investigate the correlation and impacts of these multi-modal data on pandemic forecasting using deep learning and graph learning methods.
• We conduct extensive experiments on multiple areas affected by the pandemic to show the usefulness and effectiveness of our proposed framework.
Source codes of our framework and reproducible baselines are made publicly available at https://github.com/KhanhTungTran/MGL4MEP for future research and benchmarking purposes.

Baselines
We evaluate our proposed approach against several baselines that employ different techniques, including statistical, machine learning, and deep learning approaches.
• Numerical analysis: (i) AVG: the average of the whole history is used to predict the future; (ii) AVG_WINDOW: the average statistics of the current prediction window are used to predict the future; and (iii) LAST DAY: the statistics of the current day are used as the prediction.
• Machine learning-based models: (i) LIN_REG: ordinary least squares linear regression, which fits a line to training samples for predicting future cases; (ii) GP_REG: Gaussian Process Regressor, a non-parametric regression model that utilizes Gaussian processes; (iii) RAND_FOREST and (iv) XGBOOST: tree-based models.
• Statistical models: (i) ARIMA 4, a simple autoregressive moving average model that leverages the entire history sequence as input; and (ii) PROPHET 5, similar to ARIMA but with strong seasonality characteristics.
• Deep learning models without graph topology: (i) LSTM: a straightforward LSTM model that uses the sequence of the most recent d days as its input; (ii) SE transformer and (iii) SRE transformer: baseline models using the popular transformer architecture for learning on text embeddings extracted from social media data. Self-attention is calculated between tokens of different users (extracted using the same pre-trained language model as the MGL4MEP models). Then, the final embeddings are fused with an LSTM for processing the time series and making the final predictions.
• MGL4MEP SR, MGL4MEP SE, MGL4MEP SRE: our proposed models. (i) MGL4MEP SR is the baseline LSTM with additional input features of regulation (R) information; (ii) MGL4MEP SE is the model with temporal graph learning on pre-processed social media data (E, for entity) from 1500 users in the default setting; and (iii) MGL4MEP SRE is our final model with input from three different modalities: statistics as in traditional models, regulations, and social media data.
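The three numerical-analysis baselines listed above are simple enough to sketch directly. The following is a minimal illustration, not the released implementation: the function names, the `window` argument, and the reading of AVG_WINDOW as the mean of the most recent `window` days are assumptions made for this example.

```python
from statistics import mean


def avg_baseline(history, horizon):
    """AVG: predict the mean of the whole history for every future day."""
    return [mean(history)] * horizon


def avg_window_baseline(history, window, horizon):
    """AVG_WINDOW: predict the mean of the most recent `window` days.

    Assumption: the "current prediction window" is the last `window` observations.
    """
    return [mean(history[-window:])] * horizon


def last_day_baseline(history, horizon):
    """LAST DAY: repeat the most recent observation for every future day."""
    return [history[-1]] * horizon


# Illustrative daily case counts (made-up numbers, not data from the paper).
cases = [100, 120, 140, 160, 180, 200, 220]
print(avg_baseline(cases, horizon=3))
print(avg_window_baseline(cases, window=3, horizon=3))
print(last_day_baseline(cases, horizon=3))
```

Such baselines are useful mainly as sanity checks: a learned model that cannot beat LAST DAY at a horizon of one day is not extracting any usable signal.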

Implementation details
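The proposed framework was implemented in PyTorch 18, and experiments were carried out on an NVIDIA 3090Ti GPU. We train the model for a maximum of 300 epochs with early stopping. All models are optimized with the AdamW optimizer 19 , 10 …
We evaluate the models with four metrics. The Mean Absolute Error (MAE) is given in Equation 1:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (1)$$
where $y_i$ and $\hat{y}_i$ denote the $i$-th statistics from the ground truth data and the predicted values of the models, and $n$ is the total number of samples in the test set. The MAE metric indicates the average absolute difference between the predicted values and the ground truth in the dataset (lower is better).
The RMSE metric in Equation 2 is the standard deviation of the residuals (prediction errors) (lower is better).
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (2)$$
The MAPE given in Equation 3 is the mean of the absolute percentage errors (lower is better).
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (3)$$
Finally, the Coefficient of Determination (R-squared metric) provides an insight into the similarity between real and predicted data; the closer the R-squared value is to 1, the better.
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \qquad (4)$$
Here, $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ and $\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ denote the Residual Sum of Squares and the Total Sum of Squares, respectively, and $\bar{y}$ is the mean of the ground truth values.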
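The four evaluation metrics described above can be computed in a few lines. The sketch below uses the standard textbook definitions in plain Python; the function names are illustrative, MAPE is reported in percent, and it assumes that no ground-truth value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute difference."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent (assumes no zero in y_true)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - rss / tss


# Illustrative values only (not results from the paper).
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred))
print(mape(y_true, y_pred), r2_score(y_true, y_pred))
```

Because MAPE divides by the ground truth, it becomes unstable when daily counts approach zero, which is one reason to report it alongside MAE and RMSE rather than alone.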

Results
The evaluation results for short-term predictions are presented in Table 1, where it can be observed that for horizon values of 1 or 3, our proposed approaches underperform the baseline LSTM neural network method, which is the closest competitor to our proposed models. However, the performance of our models improves significantly on the New York state dataset, as shown in Table 2. A likely reason is that the R² score, which is calculated on the predictions for each state, is close to perfect (0.9820 for a horizon of 1) on the California dataset, whereas the R² score on the New York dataset is significantly lower, indicating a higher level of difficulty for learning and prediction on the latter dataset. In general, the MGL4MEP SE method achieves the best results for short-term forecasting, followed by the MGL4MEP SRE model. This can be explained by the time lag between government efforts and their impact on the real-world situation, which spans multiple weeks; incorporating regulation information for short-term prediction may therefore adversely affect the effectiveness of the model. Furthermore, compared with SE transformer and SRE transformer, which use the transformer architecture without correlation matrices for processing social media data, our methodology based on graph neural networks with spatial-temporal characteristics clearly surpasses these approaches. This outcome highlights the efficacy and suitability of our proposed approaches, both in constructing input graph structures and in learning algorithms, for addressing these types of multi-modality domains.
With respect to long-term prediction, our models achieve the best results across all three horizons with significant gaps compared to all other methods, as illustrated in Tables 3 and 4. Generally, the best performing approaches are the MGL4MEP SRE and MGL4MEP SE models, achieving 42.47%, 34.21%, and 10.62% lower MAE than the best baseline methods at horizons of 14, 21, and 28 days ahead on the California dataset, and 11.94% and 7.50% lower MAE at horizons of 14 and 21 days on the New York dataset. Moreover, for long-term prediction, the MGL4MEP SR models obtain better results than the simple baseline LSTM models, in contrast to the results on short-term predictions. The model's capability to forecast the long-term trajectory of the pandemic means that it can provide valuable information and insights for governments and policy makers when planning ahead and making informed, timely responses to the pandemic.
We conduct comprehensive ablation studies to investigate the impact of the size of the input social media graph on our COVID-19 prediction model. In particular, we reduce the number of users selected for building the graph from the original 1,500 users to 1,000 and 500, respectively. Our hypothesis is that as we decrease the number of users, the amount of information provided by the social media data is significantly reduced. The results in Table 5 for the California region confirm our hypothesis, showing a degradation in performance with a decrease in the number of nodes for both short-term and long-term forecasting. These findings imply that it is critical to take the size of the input social media network into account when developing a model for predicting COVID-19 cases. The size of the social media graph directly influences the richness and diversity of the data captured, allowing the model to capture a more nuanced understanding of the pandemic.
Table 6 presents the ablation study on different numbers of nodes for the input social media graph of our COVID-19 prediction model on the New York dataset. The results are consistent with the experiment on the California dataset: performance degrades as the number of nodes decreases, for both short-term and long-term forecasting.
In order to assess the usefulness and effectiveness of our proposed methods in different stages of the pandemic, we perform an additional experiment by collecting data for another 150 days, thereby increasing the total amount of data collected. We then train and evaluate the same models and baselines on this new dataset; the results are presented in Table 7 and visualized in Fig. 2.
It is worth noting that this new test set for California state exhibits a higher variance compared to the previous forecasting range, where statistics gradually decrease with sharp changes.Nevertheless, our models outperform the baselines and achieve the best performance on both test sets.These results indicate the robustness and generalizability of our approaches in combating the COVID-19 pandemic in different stages of its evolution.

Discussion
Forecasting results. In the real world, COVID-19 is a complex pandemic, and many factors that cannot be seen in previous statistics can lead to different future scenarios. For example, if a government implements a strict lockdown, the number of new cases will likely decrease. However, if a government lifts all restrictions, the number of new cases will likely increase. This is why it is important to consider multiple information sources when forecasting the spread of COVID-19. Our proposed method, MGL4MEP, and its variants incorporate multiple information sources effectively, leading to better performance, lower errors, and sustained accuracy, particularly in long-term predictions, compared to other popular forecasting models. MGL4MEP enjoys these benefits because it can learn the dynamic relationships between various factors that affect the spread of COVID-19, such as official government policy and social stances toward the pandemic. Moreover, our ablation results demonstrate that the availability of more information enhances the reliability of our forecasting models. Comparisons between our methods and baselines that do not utilize the graph structure of social media data also highlight the efficacy of our graph-structure generation process and temporal graph learning framework. The results not only underline the effectiveness of the models in learning and extracting information, but also emphasize the usefulness of the input multimodal data. Additionally, our experimental results show that MGL4MEP is adaptable and robust to different situations and histories. This suggests that it can be used to forecast the spread of COVID-19 in different countries and regions, even if the pandemic is evolving rapidly. The predictions made by MGL4MEP can be leveraged by different stakeholders, such as the authorities, to develop appropriate strategies for dealing with the spread of the pandemic. For example, MGL4MEP can be used to forecast the impact of different government policies on the spread of the virus.
Finally, it is important to highlight the automated nature of the forecasting process with MGL4MEP. The entire process can be automated and seamlessly updated whenever new information becomes available. This automation is possible due to our framework's reliance on openly accessible data from the Internet, which can be efficiently gathered through automated web crawling.
Model limitations. Although some of the proposed models performed well according to certain metrics, we found several shortcomings in the models that we tested. One limitation is that MGL4MEP takes time to recognize the trend, depending on the dynamics of the considered area, because information from different sources, such as the effects of government policies, takes time to be reflected in the data. For example, in the case of New York state, it takes effect immediately, while in the case of California, it takes about 14 days. Additionally, the underlying factors that affect COVID-19 infection are diverse, and it can be difficult to capture all of them through the multiple data sources used by MGL4MEP. Another limitation is that, like other deep learning methods, MGL4MEP is a black-box model. This means that we cannot easily understand how it makes its predictions, which can make it difficult to trust or explain them.
Future research directions. There are several ways in which MGL4MEP can be improved in the future. An interesting future research direction is to enrich our framework with more information regarding the pandemic situation, such as regional age distribution, mobility, or virus variants. Another direction is to explore explainability methods, such as identifying important nodes or features through temporal graph learning, or understanding the most valuable factors that affect the forecasting results. This would increase confidence in the predictions of the model and help us better understand the dynamics of the pandemic.

Pandemic forecasting
Traditional approaches that leverage statistical models have been widely used to forecast COVID-19. These approaches involve analyzing past epidemic data using statistical and time-series methodologies to identify patterns and trends, which can then be used to forecast future outbreaks. Methods like autoregressive integrated moving average (ARIMA) 4 and Prophet 5 are effective at identifying trends in stationary time-series data and handling periodic patterns, respectively. One alternative for forecasting pandemics such as COVID-19 is the use of compartmental models 20-22. These models divide a population into compartments, such as susceptible, infected, and recovered individuals (SIR model), and use mathematical equations to describe the dynamics and transitions between the groups. However, it is important to acknowledge that leveraging statistical models for pandemic forecasting has its limitations, as they assume a linear relationship between past and future time series. Such methods rely on certain assumptions and may lack the data necessary to accurately address all the relevant issues.
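To make the compartmental idea concrete, the following is a minimal sketch of the classic SIR model integrated with forward-Euler steps. It is an illustration of the general technique rather than any model used in this paper; the function name, the transmission rate `beta`, the recovery rate `gamma`, and the population sizes are illustrative assumptions.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Simulate the SIR equations with forward-Euler integration.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    Returns one (S, I, R) tuple per day, including day 0.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(s, i, r)]
    steps_per_day = round(1 / dt)
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters: basic reproduction number beta / gamma = 3.
trajectory = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, days=60)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print("infections peak on day", peak_day)
```

The sketch also shows why such models are sensitive to their assumptions: the whole trajectory is determined by two constants, so a change in behavior or policy that alters `beta` mid-outbreak is invisible to the model unless the parameters are re-estimated.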
Deep learning has been applied to predict the spread of the COVID-19 pandemic and has achieved high performance 7,23. With the enormous datasets of records, such as infected and hospitalized cases collected on a daily basis, deep learning is considered a suitable approach, as neural networks can learn and update from data effectively. Sequential models such as the Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) 24 have been applied with strong results for forecasting the COVID-19 pandemic, both at worldwide and country levels 25,26 and at more fine-grained levels, including state and county levels 27-30. Unlike conventional approaches, deep learning can incorporate external knowledge and adapt to changing circumstances, improving its predictive capabilities. These approaches, or fusions of multiple neural network models, can incorporate a wide range of data sources, including social media, to provide a more comprehensive view of the pandemic and its potential impacts.
To the best of our knowledge, previous research has only attempted to integrate basic indicators or indices, overlooking the dynamic, intricate information contained in user-generated content 8,12,31. Incorporating these valuable signals into neural networks remains a challenge but has the potential to provide a more comprehensive view of the pandemic and its potential impacts 1,9,32.

Leveraging external resources for time-series forecasting
Previous studies have explored the connection between social media interest and pandemic trends. In 9, the authors highlight a strong correlation between the peak of search volume on the COVID-19 pandemic and the development of the pandemic, up to 20 days earlier than the issuance of official warnings. The authors of 1 also discovered a close connection between the evolution of the COVID-19 crisis and social media users' sentiments toward different phases of the pandemic. Another work 33 makes use of the social impact of media coverage to support the compartmental model for pandemic prediction. Post-processed indicators such as an internal movement index and economic response have been incorporated as additional input features to sequence models for forecasting future statistics 34,35. In contrast, our method considers a broad range of user responses through social media, together with government regulations against the pandemic. To achieve greater accuracy in pandemic forecasting, we analyze individual tweets and search for relevant social events.
There has been a significant amount of interest in effectively leveraging social media as an external knowledge source for more accurate pandemic forecasting. In 31, the authors used the tweet count (the number of tweets related to COVID-19) per day as an additional input to an LSTM model and achieved better results than using statistics only. Taking a step further, in 12, the collected tweets are distilled into two main features, representing user sentiment and topic of interest. These features are used as additional inputs to an ARIMAX model, which is an extension of ARIMA. Furthermore, in 36, important keywords are extracted and curated into a keyword cloud to present the most important information for each day, which is then fed to an MLP module for pandemic prediction. Perhaps the works most relevant to this paper are 16,17, where the authors extract the most popular keywords per day, view them as a graph structure, and employ graph algorithms to learn from those representations.
In this study, in contrast to prior works, we incorporate data from multiple sources, with social media as an important knowledge source. We build a graph structure in which each user is a node, serving as an indicator of the current status of the epidemic, and dynamically represent the interactions between users through temporal graph neural networks 13,37,38. Our approach comprehensively considers various aspects of user responses on social media and government regulations pertaining to the pandemic.

Temporal graph neural networks forecasting models
Graph neural networks (GNNs) have gained significant attention in various learning tasks, such as image recognition 39,40, estimating quantum chemical computations 41-43, and predicting protein interfaces 44. GNNs generalize the concept of convolutional neural networks to non-Euclidean domains, allowing for local operations on the nodes and edges of a graph 13,45. The most popular family of GNNs is Message Passing Neural Networks (MPNNs) 42, in which the graph convolution is defined via a message passing scheme that propagates and then aggregates the vectorized information between each node and its local neighborhood.
To handle evolving features and connectivity over time, temporal graph neural networks have been introduced. Unlike static graphs, temporal graphs are usually represented by a sequence of node interactions over continuous time instead of an adjacency matrix. Temporal GNNs aim to capture both the temporal and structural information of the temporal graphs by introducing a node memory that represents the state of the node at a given time, acting as a compressed representation of the node's past interactions. Temporal GNNs combine graph encoding techniques with time-series encoding architectures such as LSTM and Transformers, forming a powerful deep learning framework. They find applications in various domains, such as traffic prediction, where they outperform traditional methods by incorporating the spatial relationships of road networks and the temporal dynamics of traffic conditions 46-48. In the analysis of brain networks, temporal GNNs are applied to recordings from invasive techniques like electrocorticography (ECoG) to uncover temporal patterns and gain insights into brain network dynamics 48.
In our approach, by leveraging the temporal and structural aspects of graph representation in social media data, temporal GNNs enhance modeling capabilities for understanding evolving complex systems and forecasting pandemic statistics.

Preliminaries
Time-series forecasting
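Originally proposed in 24, Long Short-Term Memory (LSTM) has been the dominant recurrent network architecture for learning from sequences of data. Unlike standard feedforward neural networks, LSTM can process and retain the temporal correlations between adjacent time steps, due to its feedback connections. For a historical time step $t$, the output $y_t$ depends not only on $x_t$ but also on previous iterations through the hidden state $h_{t-1}$ and the memory variable $c_{t-1}$:
$$\Gamma_u = \sigma(W_u [h_{t-1}, x_t] + b_u), \quad \Gamma_f = \sigma(W_f [h_{t-1}, x_t] + b_f), \quad \Gamma_o = \sigma(W_o [h_{t-1}, x_t] + b_o)$$
$$\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)$$
$$c_t = \Gamma_u \odot \tilde{c}_t + \Gamma_f \odot c_{t-1}$$
$$h_t = \Gamma_o \odot \tanh(c_t)$$
where $W$ and $b$ are learnable weights and biases, $[\cdot, \cdot]$ denotes concatenation, and $\odot$ is element-wise multiplication. $\Gamma_u$ and $\Gamma_f$ are the "update gate" and "forget gate", calculated through a sigmoid ($\sigma$) activation function to determine the percentage of new memory $\tilde{c}_t$ to keep and the percentage of old memory $c_{t-1}$ to forget, respectively. The "output gate" $\Gamma_o$, also computed with the sigmoid function, controls how much information is revealed: the hidden state is obtained by the element-wise multiplication of $\Gamma_o$ and the memory cell $c_t$ activated by the non-linear tanh function.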
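A single LSTM step with the update, forget, and output gates described above can be sketched as follows. This is the standard LSTM update written for a scalar state to keep the example dependency-free; the function names and the parameter layout (one `(w_h, w_x, b)` triple per gate) are assumptions made for illustration, and a real model would use vector states and a library layer such as the one provided by PyTorch.

```python
import math


def sigmoid(z):
    """Logistic function used by all three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step for scalar input and scalar state.

    `params` maps each of "u", "f", "o", "c" to a (w_h, w_x, b) triple,
    a hypothetical layout chosen for this sketch.
    """
    def affine(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    gamma_u = sigmoid(affine("u"))   # update gate: share of new memory to keep
    gamma_f = sigmoid(affine("f"))   # forget gate: share of old memory to keep
    gamma_o = sigmoid(affine("o"))   # output gate: share of memory to reveal
    c_tilde = math.tanh(affine("c"))  # candidate memory
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    h_t = gamma_o * math.tanh(c_t)
    return h_t, c_t


# Run the cell over a short, made-up input sequence.
params = {name: (0.5, 0.1, 0.0) for name in ("u", "f", "o", "c")}
h, c = 0.0, 0.0
for x in [0.2, 0.4, 0.6]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```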
A simpler variant of LSTM, the Gated Recurrent Unit (GRU) 49, combines the cell memory and the hidden state into a single variable $h_t$ to transfer information. Therefore, a GRU only has two gates, a "reset gate" and an "update gate".
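For reference, one common formulation of the GRU is given below. It is a standard textbook form rather than an equation taken from this paper: the weight names $W_r$, $W_u$, $W_h$ and biases are assumptions, and conventions differ between references on whether the update gate weights the candidate state or the previous state.

```latex
\begin{aligned}
\Gamma_r &= \sigma\left(W_r [h_{t-1}, x_t] + b_r\right) && \text{(reset gate)} \\
\Gamma_u &= \sigma\left(W_u [h_{t-1}, x_t] + b_u\right) && \text{(update gate)} \\
\tilde{h}_t &= \tanh\left(W_h [\Gamma_r \odot h_{t-1}, x_t] + b_h\right) && \text{(candidate state)} \\
h_t &= \Gamma_u \odot \tilde{h}_t + (1 - \Gamma_u) \odot h_{t-1} && \text{(new hidden state)}
\end{aligned}
```

With a single state variable and one fewer gate, the GRU has fewer parameters per unit than the LSTM, which is one practical reason to prefer it inside a larger recurrent graph model.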
Finally, the last hidden state variable $h_t$ can be used to predict the corresponding output value $\hat{y}_t$ through a fully connected layer with softmax activation function:
$$\hat{y}_t = \mathrm{softmax}(W_y h_t + b_y)$$
Temporal graph learning algorithms
Graph neural networks (GNNs) are a class of neural networks that operate on graph-structured data. Graphs are a powerful way to represent many types of data, such as social networks, biological networks, and traffic flows. GNNs are capable of learning the relationships between nodes in a graph. They generalize the concept of convolutional neural networks to non-Euclidean domains by defining local operations on the nodes and edges of a graph. A typical GNN layer operating on an input graph $G = (X, E, A)$ can be formulated as in Equation 8, where $X \in \mathbb{R}^{N \times D_X}$ represents the node matrix, in which each of the $N$ nodes has $D_X$ features, and $A \in \mathbb{R}^{N \times N}$ is a weighted adjacency matrix encoding the set of edges $E$. The graph convolution operator $\star$ can be approximated by a first-order Chebyshev polynomial expansion and generalized to high-dimensional signals 45,50 with learnable parameter $W \in \mathbb{R}^{D_X \times D_Y}$.
Temporal Graph Neural Networks are an extension of GNNs that can handle temporal graphs, i.e., graphs that change over time. Unlike static graphs, temporal graphs are usually represented by a sequence of node interactions over continuous time instead of an adjacency matrix. Temporal GNNs aim to capture both the temporal and structural information of the temporal graphs by introducing a node memory that represents the state of the node at a given time, acting as a compressed representation of the node's past interactions. In this work, we follow the framework of recent studies, including GCRN 37, AGCRN 47, and MPNN LSTM 15, that place a recurrent neural network on top of graph convolution operators. We adopt a simplified approach and use a GRU as the recurrent network architecture.
This framework allows MGL4MEP to learn the dynamic interactions between entities, which act as nodes or indicators of the current status of the pandemic, and between different time stamps throughout the evolution of the pandemic.
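As a rough illustration of the graph convolution step discussed above, the sketch below implements one widely used first-order form, $Y = (I + D^{-1/2} A D^{-1/2}) X W$, where $D$ is the degree matrix of $A$. Whether this matches the paper's Equation 8 term by term is an assumption; the helper names `matmul` and `graph_conv` and the toy graph are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def graph_conv(x, adj, weight):
    """First-order graph convolution: (I + D^{-1/2} A D^{-1/2}) X W.

    x:      N x D_X node feature matrix
    adj:    N x N weighted adjacency matrix
    weight: D_X x D_Y learnable parameter matrix
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    # Identity plus symmetrically normalized adjacency.
    support = [[(1.0 if i == j else 0.0) + inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
                for j in range(n)] for i in range(n)]
    return matmul(matmul(support, x), weight)


# Toy path graph with three users: 0 - 1 - 2 (made-up example).
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
x = [[1.0], [2.0], [3.0]]   # one feature per node
weight = [[1.0]]            # identity-like 1 x 1 weight
print(graph_conv(x, adj, weight))
```

In a recurrent graph model of the GCRN or AGCRN kind, an operator like this replaces the dense matrix multiplications inside the GRU gates, so each node's hidden state is updated from its own history and from its neighbors' features at every time step.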

Pre-trained Language models
Since we are dealing with free-text data to capture the population's reactions to the pandemic, more specifically user-generated content sourced from social media, it is crucial to extract meaningful information before constructing the graph-structured representation of the data. In recent years, large pre-trained language models such as Bidirectional Encoder Representations from Transformers (BERT) 51 have revolutionized the field of natural language processing. These models have demonstrated exceptional capabilities in understanding and generating human language. BERT's underlying architecture, based on the Transformer 52, employs self-attention mechanisms to capture dependencies between words or tokens in a sentence. This enables BERT to comprehend the contextual information of a word based on its surrounding words, leading to more accurate language understanding and representation. By pre-training BERT on massive amounts of textual data with self-supervised tasks, namely masked language modeling and next sentence prediction, the model learns a rich language representation that can be fine-tuned for specific downstream tasks.
Building upon recent advancements in applying large pre-trained models to domain-specific data 53,54, we leverage BertTweet 55, a variant of BERT, as our main feature extractor for text embeddings. The model has been trained on a large amount of Twitter data, including a subset of COVID-19 related data, and its effectiveness in capturing the nuanced meanings and signals conveyed in text data has been well established. By leveraging BertTweet, we obtain high-quality features that represent the semantic content of social media posts surrounding the COVID-19 pandemic, which enables us to discover valuable patterns and trends that contribute to a comprehensive understanding of the social media landscape during the global health crisis.

Methodology
MGL4MEP
In this section, we present the multi-modality framework and techniques employed in our study to effectively extract and model multi-modal data for COVID-19 forecasting. Our approach aims to harness the power of social media data, specifically user-generated content, and the government stringency index to gain valuable insights into the evolution of the pandemic. We describe the key components of our framework, including data pre-processing, feature extraction, graph-based representation with temporal graph learning, and multi-modal learning. The main components of our proposed framework are depicted in Figure 3.
One goal of our multi-modality framework is to effectively incorporate signals from human-generated text on social media platforms, which offer valuable reflections of the population's response to the pandemic, as exemplified in Figure 1. The importance of these signals is underscored by the sheer volume of COVID-19 discussion over time, which correlates strongly with pandemic statistics, as shown in Figure 4. Details about our social media data collection process are provided in the next section. Extracting meaningful features from text data is crucial for constructing a comprehensive understanding of the information shared on social media platforms. While other works extract indices such as sentiment, this might discard necessary information; for example, an index may reflect sentiment about COVID-19 but not about the current government response to the pandemic. As discussed in previous sections, by employing a pre-trained language model, specifically BertTweet, as the text feature extractor, we capture the rich insights contained within user-generated content, especially in relation to the COVID-19 pandemic. Our framework enables integrating various modalities to capture the complex and temporal dynamics of emerging pandemics. We obtain a temporal embedding for each user by applying BertTweet to their text data as follows:

$$x_{t,user_i} = \sum_{j \in (t, user_i)} \mathrm{BertTweet}(post_j), \tag{10}$$

where $t$ denotes the timestamp, $user_i$ denotes the $i$-th user in our social media data, and $post_j$, $j \in (t, user_i)$, denotes the text obtained from the tweets of the $i$-th user at time $t$.

The inclusion of user interactions and shared information across users is crucial for further analysis. In order to capture the correlations and dependencies between users, it is imperative to construct a graph-structured representation of the pre-processed data. This graph-based approach allows us to model the interactions and information flow through graph neural networks, capturing the dynamics and valuable insights related to the ongoing pandemic from social media signals. Hence, we introduce an end-to-end learning algorithm to discover the underlying graph structure that captures the correlation among time-series in a data-driven manner. More specifically, define the node embeddings extracted from the pre-trained language model as $\mathbf{X}^{G}_{t} := [x_{t,user_1}, x_{t,user_2}, \ldots, x_{t,user_N}]^{T} \in \mathbb{R}^{N \times D_X}$; the continuous adjacency matrix can then be calculated as the dot-product similarity matrix of the node embeddings. However, for effective learning with temporal graph learning algorithms, this approach has two downsides. First, a large embedding dimension can make the adjacency matrix calculation inaccurate. Second, the embeddings may include information not directly related to our downstream task and consume resources during training and evaluation. Inspired by AGCRN 47 , we employ node-specific learnable embeddings that allow us to map the input dimension to a lower intermediate embedding dimension. In the resulting graph convolution, $g$ denotes the filter parameterized by $\mathbf{W}$ and $\mathbf{E}$, while $\star$ denotes the graph convolution operator. $\mathbf{E} \in \mathbb{R}^{N \times D_{emb}}$ is a learnable intermediate node embedding matrix. The input node matrix is multiplied with the node embedding $\mathbf{E}$, resulting in an updated representation; because $\mathbf{E}$ is learnable, the representation is specific to each node and its pattern.
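The per-user daily aggregation of Equation 10 and the dot-product similarity between users can be sketched as follows. This is a minimal illustration under stated assumptions: `embed_post` is a hypothetical stand-in that returns a toy deterministic vector in place of a real BertTweet sentence embedding, and the dates, user names, and posts are invented.

```python
from collections import defaultdict


def embed_post(text, dim=4):
    """Hypothetical stand-in for BertTweet(post): a deterministic toy vector.
    A real pipeline would return the language model's embedding instead."""
    vec = [0.0] * dim
    for pos, ch in enumerate(text):
        vec[pos % dim] += (ord(ch) % 7) / 10.0
    return vec


def user_day_embeddings(posts, dim=4):
    """Equation 10: sum the post embeddings of each user on each day.
    `posts` is an iterable of (day, user, text) tuples."""
    sums = defaultdict(lambda: [0.0] * dim)
    for day, user, text in posts:
        emb = embed_post(text, dim)
        acc = sums[(day, user)]
        for d in range(dim):
            acc[d] += emb[d]
    return dict(sums)


def dot_product_adjacency(node_matrix):
    """Continuous adjacency as pairwise dot products between node embeddings."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in node_matrix]
            for xi in node_matrix]


posts = [
    ("2020-08-01", "user_1", "masks help"),
    ("2020-08-01", "user_1", "stay home"),
    ("2020-08-01", "user_2", "cases rising again"),
]
x = user_day_embeddings(posts)
users = ["user_1", "user_2"]
# N x D_X node matrix for one day; a user without posts that day would need
# a default (for example a zero vector), which this sketch does not handle.
X_t = [x[("2020-08-01", u)] for u in users]
A_t = dot_product_adjacency(X_t)  # N x N similarity matrix
```

With real language-model embeddings the dimension is in the hundreds, which is one reason the text moves to a lower-dimensional learnable node embedding.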
Then, the integrated node embeddings are further multiplied by the weight matrix $\mathbf{W}$ to incorporate the influence of the node-specific features. Moreover, we replace the normalized graph Laplacian matrix 50 with the inner product of the intermediate node embedding matrix $\mathbf{E}$ and its transpose $\mathbf{E}^{T}$. This operation captures the pairwise relationships between node embeddings and produces a matrix of shape $N \times N$. We apply the rectified linear unit (ReLU) activation function to introduce non-linearity and ensure non-negative values in the resulting matrix. The softmax function is then applied to normalize the values across each matrix row, ensuring that every row sums to 1. This step yields a valid probability distribution representing the importance or relevance of each node (that is, each user) with respect to the others. This graph convolution operation is plugged into the framework in Equation 9, and the final temporal graph learning algorithm is given in Equation 12, where $g$ denotes learnable weights with respect to different embeddings.

To complement the multi-modal nature of our framework, we incorporate government stringency features that provide valuable insights into the pandemic response at a regional level. Government stringency features capture the level of restrictions, policies, and interventions implemented by authorities to mitigate the spread of COVID-19. These features serve as an important contextual signal to enhance the understanding of the evolving dynamics in our model. Specifically, we utilize the raw data and formula proposed in 56 to compute an indicator of the level of government stringency. However, recognizing the complexity of this domain, we compare and analyze each individual indicator, as well as the averaged general stringency index, to identify the most suitable indicator for the current pandemic situation. In Figure 5, we present the correlation between each of two indicators and the number of new COVID-19 cases at different time lags. The results suggest a strong relationship between restrictions on internal movement and the status of the pandemic. Hence, in the refined version of our framework, we leverage this specific indicator as a measure of government stringency.
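The data-driven adjacency described in this section, a row-wise softmax over ReLU of the node embedding inner products, can be sketched as below. This is an illustrative sketch, not the authors' implementation: the embedding matrix is filled with random values, whereas in the framework it would be learned jointly with the forecasting model, and the sizes `N` and `D_EMB` are arbitrary.

```python
import math
import random


def adaptive_adjacency(E):
    """Row-wise softmax(ReLU(E E^T)) for an N x D_emb embedding matrix E."""
    n = len(E)
    adj = []
    for i in range(n):
        # Inner product of node i with every node, clipped at zero by ReLU.
        scores = [max(0.0, sum(a * b for a, b in zip(E[i], E[j])))
                  for j in range(n)]
        # Stable softmax so that the row forms a probability distribution.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


random.seed(0)
N, D_EMB = 5, 3  # illustrative sizes only
E = [[random.gauss(0.0, 1.0) for _ in range(D_EMB)] for _ in range(N)]
A = adaptive_adjacency(E)
```

Because the softmax is applied per row, the resulting matrix is generally not symmetric: the weight user i places on user j can differ from the weight user j places on user i.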
Since this indicator can be represented as a vector for each day, similar to the statistical metrics of the pandemic, we can employ a dedicated recurrent neural network (i.e., Equation 5 and Equation 6) that learns solely from this feature. Alternatively, we can concatenate it with the pandemic statistics and learn through a unified recurrent network. Through extensive experimentation, we have found that the latter approach yields superior performance, and thus it is our final choice for incorporating the government stringency indicator into our framework.
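The concatenation option can be illustrated by building the daily input vectors and the forecasting samples that a unified recurrent network would consume. The sketch below rests on assumptions not stated in the text: the function name `make_windows`, the `lookback` window length, the convention that a horizon of 1 means the day after the window, and all numeric values are hypothetical.

```python
def make_windows(cases, hospitalized, stringency, lookback, horizon):
    """Concatenate per-day features and build (input window, target) pairs.
    The target holds the two statistics `horizon` days after the window ends."""
    if not (len(cases) == len(hospitalized) == len(stringency)):
        raise ValueError("all series must have the same length")
    # One feature vector per day: statistics concatenated with the indicator.
    features = [[c, h, s] for c, h, s in zip(cases, hospitalized, stringency)]
    samples = []
    for start in range(len(features) - lookback - horizon + 1):
        end = start + lookback            # index one past the last input day
        target_day = end + horizon - 1    # horizon = 1 means the next day
        window = features[start:end]
        target = features[target_day][:2]  # statistics only, not the indicator
        samples.append((window, target))
    return samples


# Ten days of made-up values.
cases = [100.0 + 5 * d for d in range(10)]
hospitalized = [10.0 + d for d in range(10)]
stringency = [2.0] * 5 + [1.0] * 5
samples = make_windows(cases, hospitalized, stringency, lookback=3, horizon=2)
```

Each sample pairs three consecutive days of three features with the statistics two days after the window, so the indicator enters the recurrent model as an extra input channel rather than through a separate network.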
Finally, in order to make accurate predictions, it is crucial to integrate the information from multiple modalities in our framework. We achieve this by fusing the embeddings obtained from different modalities, namely statistical features, government stringency features, and social-media graph-based features. In the fusion step, $\mathbf{H}^{\{stat,reg\}}_{t+T}$ denotes the embeddings learned by the recurrent neural network for the statistical metrics and government stringency features, and $\hat{y}_{t+T}$ represents the predicted value for time step $t + T$, where $T$ is the forecasting horizon. Embeddings from the various domains, each capturing the relevant information for its modality, are fused using the concatenation operator ⊕ to create a unified feature representation.
By fusing the embeddings from multiple modalities in this way, our system combines a variety of information sources and exploits the complementary nature of the different modalities for enhanced forecasting performance. This comprehensive approach enables us to capture the intricate dynamics and interdependencies within the data, leading to more accurate and reliable forecasts of the future evolution of the pandemic.
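Fusion by concatenation can be sketched as follows. The concatenation step mirrors the ⊕ operator described above; the linear output layer, the function name `fuse_and_predict`, and all numbers are assumptions made for illustration, since the text does not specify the prediction head.

```python
def fuse_and_predict(h_stat_reg, h_graph, weights, bias):
    """Concatenate modality embeddings (the ⊕ operator), then apply a
    linear output layer. The linear layer is an assumed prediction head."""
    fused = list(h_stat_reg) + list(h_graph)
    if len(weights) != len(bias):
        raise ValueError("one bias term is needed per output")
    if any(len(row) != len(fused) for row in weights):
        raise ValueError("weight rows must match the fused dimension")
    return [sum(w * f for w, f in zip(row, fused)) + b
            for row, b in zip(weights, bias)]


# Toy embeddings: 3 dimensions from the recurrent branch, 2 from the graph branch.
h_stat_reg = [0.2, -0.1, 0.4]
h_graph = [0.5, 0.3]
# Two outputs, for example infected and hospitalized cases at time t + T.
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
bias = [0.0, 0.0]
y_hat = fuse_and_predict(h_stat_reg, h_graph, weights, bias)
```

Concatenation keeps each modality's embedding intact and leaves it to the output layer to learn how much weight each source deserves.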

Multimodal data collection process
In this study, as shown in Table 8, we utilized three different types of data sources to gain insights into the COVID-19 pandemic and its development.
• COVID-19 Statistic Data. We leverage the statistic dataset from Johns Hopkins University 57 with 450 data points from August 1, 2020, to November 30, 2021. Each data point is the number of confirmed COVID-19 infections or serious, hospitalized cases in a given area per day. Our final task is time-series forecasting on these multivariate statistics with different horizons to predict the trajectory of the pandemic. The trained models can then be a valuable tool in responding to the pandemic, as they can help policymakers make better decisions about how to allocate resources, implement public health measures, and prepare for the future.
• COVID-19 Government Responses and Regulations Data. The stringency index data 56 is a valuable resource for understanding the level of government response to the COVID-19 pandemic. The index is a numeric value between 0 and 100 and includes nine different indicators, such as the closure of schools and workplaces, cancellation of public events, restrictions on gatherings, and orders to shelter in place. Fig. 5 displays the correlation of both the stringency index and the recorded restrictions on internal movement between regions with the daily number of newly infected cases. Interestingly, both indicators exhibit a clear trend, with correlation values peaking at a time lag of around 30 days. Moreover, the correlation of the restrictions on internal movement with newly infected cases is consistently higher than that of the stringency index. The same is observed for newly hospitalized cases. These correlation trends imply that the current government response can act as a valuable indicator for forecasting how the epidemic will develop in the future.
• Social Media Data. We crawl more than 74 million tweets using the Twitter API and the tweet IDs of all COVID-19 related tweets released by Banda et al. 58 . The original authors leveraged the Twitter Stream to collect all tweets in the category of COVID-19 chatter, with over 4 million tweets a day. In this exploratory study, we retained only tweets with geo-location tags in either California state or New York state. Moreover, we removed all tweets that are not in English. For each location, we randomly selected 1,500 different users and kept all of their tweets. The distributions of tweets over time with respect to the statistics of newly confirmed cases are illustrated in Fig. 4. A strong correlation between the two time-series can be recognized, although there is a noisy period at the start. This is likely due to the initial confusion and fear surrounding the appearance of COVID-19, which led to a high volume of discussions about the virus worldwide.
As the situation became more stable and people gained a better understanding of the pandemic's effect on their own regions, the tweets posted became more relevant and correlated more strongly with users' areas of residence.
To account for this, we excluded data from the initial few months of the pandemic and only collected data starting from August 1, 2020.
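The time-lagged correlation analysis discussed for the stringency indicators can be sketched with a plain Pearson correlation over shifted series. The sketch assumes that a lag of k days pairs the indicator on day t with the case count on day t + k; this convention is our assumption, and the series here are synthetic, so the lag found says nothing about the real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def lagged_correlations(indicator, cases, max_lag):
    """Correlate indicator[t] with cases[t + lag] for lag = 0..max_lag."""
    if len(indicator) != len(cases):
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        xs = indicator[: len(indicator) - lag]
        ys = cases[lag:]
        result[lag] = pearson(xs, ys)
    return result


# Synthetic series: the "cases" curve is the indicator delayed by 7 days.
indicator = [math.sin(t / 5.0) for t in range(120)]
cases = [math.sin((t - 7) / 5.0) for t in range(120)]
corr = lagged_correlations(indicator, cases, max_lag=20)
best_lag = max(corr, key=corr.get)
```

Plotting `corr` against the lag gives a curve of the same kind as Fig. 5, with the peak marking the delay at which the indicator is most informative.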

Conclusion
In this work, we present a novel framework named MGL4MEP that combines temporal graph neural networks and multi-modal data for accurate pandemic forecasting. By integrating various big data sources, including social media content, we effectively capture the complex dynamics of emerging pandemics. Our framework outperforms traditional approaches by leveraging the potential of pre-trained language models and generating graph-structured data. Extensive experiments conducted with multiple variants of our proposed method demonstrate the effectiveness of our framework in providing timely and comprehensive insights into the pandemic landscape. The fusion of temporal graph learning and multi-modal data enables a deeper understanding of the evolving patterns and indicators, leading to more informed public health management and decision-making. Our approach offers a promising direction for leveraging big data in pandemic research and provides a foundation for future advancements in the field.

Figure 1. Examples showing different stances on social media reacting to the pandemic and government regulations 6 .

Figure 2. Forecasting results on the test set for the newly collected period of the California state dataset. (a) Infectious cases. (b) Hospitalized cases.

Figure 3. Overall architecture of MGL4MEP, a multi-modal framework for enhanced pandemic forecasting with external resources. MGL4MEP incorporates both pandemic-related metrics and the population's reactions on social media into the forecasting to better capture the dynamic properties of emerging pandemics.

Figure 4. Comparison between the number of tweets posted and the number of new COVID-19 cases per day. (a) In California state. (b) In New York state.

Figure 5. Correlations of the Stringency Index and of Restrictions on internal movement with daily infected cases of COVID-19 across different time lags.

Table 1. Results on California State for short-term predictions.

Table 2. Results on New York State for short-term predictions.

Table 3. Results on California State for long-term predictions.

Table 4. Results on New York State for long-term predictions.

Table 5. Ablation study on the number of social media users needed to model societal interactions and factors. We train MGL4MEP SE with different numbers of nodes (users) in the social network graph and evaluate each model's performance on California data.

Table 6. Ablation study on the number of social media users needed to model societal interactions and factors. We train MGL4MEP SE with different numbers of nodes (users) in the social network graph and evaluate each model's performance on New York data.

Table 7. Results on California State when the data collection period is extended to May 2022.

Table 8. Input features used for the proposed approaches.